# Reinforcement learning optimization
Polaris 4B Preview F32 GGUF
Apache-2.0
Polaris is an open-source post-training method that uses reinforcement learning to refine a model and improve its reasoning capabilities.
Large Language Model
Transformers English

prithivMLmods
765
1
Longwriter Zero 32B I1 GGUF
Apache-2.0
The LongWriter-Zero-32B quantized model is derived from THU-KEG/LongWriter-Zero-32B, supports both Chinese and English, and is suited to long-context scenarios such as reinforcement learning and writing.
Large Language Model
Transformers Supports Multiple Languages

mradermacher
135
1
Longwriter Zero 32B GGUF
Apache-2.0
The LongWriter-Zero-32B quantized model is a multilingual model produced by static quantization of the original model. It is suited to long-context scenarios such as reinforcement learning and writing.
Large Language Model
Transformers Supports Multiple Languages

mradermacher
204
1
Acereason Nemotron 1.1 7B GGUF
Other
A high-performance 7B-parameter language model released by NVIDIA that focuses on mathematical and code reasoning tasks and supports a 128k context length.
Large Language Model Supports Multiple Languages
lmstudio-community
278
1
Kimi Dev 72B
MIT
Kimi-Dev-72B is an open-source coding large language model for software engineering tasks, achieving the best results among open-source models on SWE-bench Verified.
Large Language Model
Transformers Other

moonshotai
324
162
Contentv 8B
Apache-2.0
ContentV is an efficient video generation framework that achieves high-quality video generation with limited computing resources through a minimalist architecture, a multi-stage training strategy, and a cost-effective reinforcement learning framework.
Video Processing
ByteDance
417
25
Qwenlong L1 32B
Apache-2.0
QwenLong-L1 is a long-context reasoning model trained with reinforcement learning, demonstrating excellent performance across seven long-context document QA benchmarks.
Large Language Model
Transformers

Tongyi-Zhiwen
683
106
Verireason Codellama 7b RTLCoder Verilog GRPO Reasoning Tb
VeriReason is a Verilog RTL code generation method that combines reinforcement learning with testbench feedback, significantly improving the performance of pre-trained models on hardware design tasks.
Large Language Model
Transformers

Nellyw888
1,483
1
INTELLECT 2 GGUF
Apache-2.0
INTELLECT 2 is a large language model released by PrimeIntellect. It supports a context length of 40,960 tokens and was trained on the QwQ architecture using the GRPO reinforcement learning algorithm.
Large Language Model
lmstudio-community
467
5
Deephermes Financial Fundamentals Prediction Specialist Atropos
This is an experimental financial analysis model optimized for financial fundamentals prediction through the Atropos reinforcement learning framework.
Large Language Model
Transformers English

NousResearch
52
5
Tinyv 1.5B
Apache-2.0
A model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the TinyV reward system, which provides more accurate reward signals during reinforcement learning (RL) post-training and significantly improves both RL efficiency and the performance of the final model.
Large Language Model
Transformers

zhangchenxu
1,124
1
Unt 8b
Apache-2.0
The Camel Model is a text generation model based on the transformer architecture, supporting Azerbaijani and trained using reinforcement learning.
Large Language Model
Transformers Other

omar07ibrahim
33
2
Community Request 01 12B
A pre-trained language model merged from multiple Captain-Eris series models using the mergekit tool.
Large Language Model
Transformers

Nitral-AI
19
3
STILL 3 1.5B Preview
STILL-3-1.5B-preview is a slow-thinking model enhanced with reinforcement learning, achieving 39.33% accuracy on the AIME benchmark.
Large Language Model
Transformers

RUC-AIBOX
2,186
10
Codet5 Large Ntp Py
BSD-3-Clause
CodeT5 is a large-scale encoder-decoder model pre-trained with a next-token prediction (NTP) objective on Python, focusing on code understanding and generation tasks.
Large Language Model
Transformers

Salesforce
217
27
Ppo BreakoutNoFrameskip V4
This is a reinforcement learning agent based on the PPO algorithm, specifically designed for training and evaluation in the BreakoutNoFrameskip-v4 game environment.
Image Generation
sb3
22
0